The dataset chosen for this “Exploratory Data Analysis and Data Visualisation” coursework is a policing dataset from Dallas, Texas, covering the year 2016. The task is to provide an analysis of this dataset. It has 47 variables and 2385 observations. The variables describe officers and their subjects, including race, gender, injury, type of force used during arrest, offense, description, and so on. The dataset mainly focuses on race as a factor across different types of incident. These data were collected from police departments across the United States.
The methodology of the whole procedure is described here. Firstly, the dataset “37-00049_UOF-P_2016_prepped” was loaded and pre-processed, which required data wrangling and cleaning. Careful inspection revealed several problems with the dataset. One was that many cells in the records were empty, so I filled those empty cells with ‘NA’. I then realized that some columns had so many ‘NA’ values that they could have distorted the analysis, so I removed the columns with more than 50% ‘NA’ values. I also noticed that the first row of the dataset was almost the same as the header row. After finishing the pre-processing, I used visualizations to raise and explore the questions below.
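The cleaning steps described above can be sketched in a few lines. This is only an illustration in plain Python, not the code used for the analysis: the column names and rows are made-up toy data, and the helper `drop_sparse_columns` is a hypothetical name.

```python
# Illustrative sketch only: toy rows and column names are assumptions,
# not records from the Dallas dataset.

def drop_sparse_columns(rows, threshold=0.5, missing=("", "NA", None)):
    """Mark blank cells as 'NA' and drop columns whose share of
    missing values is greater than `threshold`."""
    if not rows:
        return [], []
    columns = list(rows[0].keys())
    n = len(rows)
    # Share of missing values in each column.
    na_share = {
        col: sum(1 for r in rows if r.get(col) in missing) / n
        for col in columns
    }
    kept = [col for col in columns if na_share[col] <= threshold]
    # Rebuild each row with only the kept columns, blanks replaced by 'NA'.
    cleaned = [
        {col: ("NA" if r.get(col) in missing else r[col]) for col in kept}
        for r in rows
    ]
    return kept, cleaned


toy_rows = [
    {"SUBJECT_RACE": "Black", "OFFICER_GENDER": "Male", "NOTES": ""},
    {"SUBJECT_RACE": "White", "OFFICER_GENDER": "", "NOTES": ""},
    {"SUBJECT_RACE": "Hispanic", "OFFICER_GENDER": "Female", "NOTES": "x"},
    {"SUBJECT_RACE": "", "OFFICER_GENDER": "Male", "NOTES": ""},
]

kept_columns, cleaned_rows = drop_sparse_columns(toy_rows)
print(kept_columns)
```

In this toy example the `NOTES` column is empty in three of four rows, so it is dropped, while the single blank cells in the other two columns are kept as ‘NA’.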
When an officer targets a subject in Dallas, Texas, does the subject’s race matter? In the histogram above, the most common subject race is Black, with a count of 1333. Hispanic and White subjects come second and third, each with less than half the frequency of Black subjects. Before interpreting this, we should consider that most people in the area might simply be Black. However, in 2016 Dallas, Texas had a population of about 1.3 million, of whom approximately 63% were White and approximately 24% were Black. So the graph suggests that Black people in Dallas, Texas are over-represented among subjects relative to their share of the population, which points to a racial disparity in policing.
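The comparison can be made explicit with a little arithmetic. The sketch below uses only figures quoted in this report. It assumes the full 2385 observations as the denominator, which still includes rows with missing race values, so the result is approximate.

```python
# Figures quoted in the report; the denominator is an assumption
# (all 2385 observations, including rows with missing race).
black_subjects = 1333
total_records = 2385
population_share_black = 0.24  # approximate share quoted in the text

# Share of recorded subjects who are Black.
subject_share_black = black_subjects / total_records

# How many times larger that share is than the population share.
representation_ratio = subject_share_black / population_share_black

print(f"Share of subjects: {subject_share_black:.1%}")
print(f"Ratio to population share: {representation_ratio:.2f}")
```

A ratio well above 1 indicates over-representation relative to the population share, though it does not by itself explain the cause.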
Is the police department in Dallas, Texas male dominated? The bar plot above clearly shows a large gap between the numbers of male and female officers in the dataset. Approximately 90% of the police officers are male and approximately 10% are female. So we can say that the police department in Dallas, Texas is male dominated.
This bar chart shows whether the subject’s race plays any role in whether the subject is arrested. Approximately 86% of Black subjects were arrested, compared with approximately 88% of White subjects and 86% of Hispanic subjects. So, as far as arrests are concerned, there is no evidence of racial favoritism.
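Arrest rates like these come from a simple group-wise proportion. The following is a minimal sketch of that calculation on invented toy records. The `(race, arrested)` layout and the ‘Yes’/‘No’ coding are assumptions, and the resulting rates are not the ones reported above.

```python
from collections import defaultdict


def arrest_rate_by_race(records):
    """Return the share of subjects arrested within each race group."""
    totals = defaultdict(int)
    arrests = defaultdict(int)
    for race, arrested in records:
        totals[race] += 1
        if arrested == "Yes":
            arrests[race] += 1
    return {race: arrests[race] / totals[race] for race in totals}


# Toy (race, arrested) pairs, invented for illustration only.
toy_records = [
    ("Black", "Yes"), ("Black", "Yes"), ("Black", "No"),
    ("White", "Yes"), ("White", "No"),
    ("Hispanic", "Yes"),
]

rates = arrest_rate_by_race(toy_records)
for race, rate in rates.items():
    print(f"{race}: {rate:.0%}")
```

Comparing rates within each group, rather than raw counts, is what makes the groups comparable despite their very different sizes.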
From the density plot, we can state that the majority of officers in the Dallas, Texas police department are relatively new, with little experience.
From the pie chart, we can see that most of the officers in the Dallas, Texas police department are White. This could simply reflect that a large proportion of the local population is White.
This is a bar chart of the subject’s gender. Apart from 10 null values and 1 unknown value, male subjects dominate here as well. From this we can say that females are less likely to be involved in a crime.
The box plot shows which officers are more likely to arrest their subject when an incident happens: those with less experience or those with more. Both the ‘yes’ and ‘no’ groups have the same median value, 6, so experience does not appear to affect whether an arrest is made.
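The medians read off a box plot can also be computed directly. This is a small sketch on invented toy values chosen so that both groups share a median of 6. The `(experience, arrested)` layout is an assumption, not the structure of the original data.

```python
from statistics import median

# Toy (experience, arrested) pairs, invented for illustration only.
toy_records = [
    (2, "Yes"), (6, "Yes"), (9, "Yes"),
    (4, "No"), (6, "No"), (11, "No"),
]


def median_experience_by_outcome(records):
    """Group experience values by arrest outcome and take each median."""
    groups = {}
    for experience, arrested in records:
        groups.setdefault(arrested, []).append(experience)
    return {outcome: median(values) for outcome, values in groups.items()}


medians = median_experience_by_outcome(toy_records)
print(medians)
```

Equal medians only show that the centres of the two groups match. The spread of each group, which the box plot also displays, can still differ.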
This raster plot shows whether the more experienced or the less experienced officers are hospitalized. From the plot, less experienced officers appear more likely to be hospitalized if they are injured in an incident.
With this heat map, I tried to visualize the races of the subjects for officers of each race. The highest frequency in the map occurs when the officer is White and the subject is Black. This frequency is 846, which is considerably greater than the others.
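The cell values of such a heat map are counts of each officer-race and subject-race combination. The sketch below shows one way to tabulate them with the standard library, using invented toy pairs rather than the real records.

```python
from collections import Counter

# Toy (officer_race, subject_race) pairs, invented for illustration only.
toy_pairs = [
    ("White", "Black"), ("White", "Black"), ("White", "Hispanic"),
    ("Black", "Black"), ("Hispanic", "White"),
]

# Each distinct pair corresponds to one cell of the heat map.
pair_counts = Counter(toy_pairs)

# The most frequent combination and its count.
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Because the counts depend on how many officers of each race there are, dividing each row by its total would give a fairer comparison between officer groups.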
From this time series data, we try to see at which time of day, month, or year an incident is most likely to happen. From the Hours vs Incident Rates graph, we can tell that incidents are most likely around dusk and after dark. The incident count is highest at 20:00, and the incident rate decreases after 2 in the morning. In the Months vs Incident Rates graph, incident rates rise slightly over the first couple of months and then fall until July. The most incidents occur in March and the fewest in December. Due to the lack of data, the Year vs Incidents graph could not be interpreted properly, but we can see that going into the new year, 2017, the incident rate decreases slightly.
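Counts by hour and by month can be derived by parsing each incident timestamp and tallying one component of it. The following sketch assumes a `YYYY-MM-DD HH:MM` timestamp format, which may not match the original file, and uses invented toy timestamps.

```python
from collections import Counter
from datetime import datetime

# Toy timestamps, invented for illustration; the format is an assumption.
toy_timestamps = [
    "2016-03-05 20:15",
    "2016-03-12 20:40",
    "2016-07-01 02:05",
    "2016-12-24 14:30",
]


def count_by(timestamps, part, fmt="%Y-%m-%d %H:%M"):
    """Count incidents by one datetime attribute, e.g. 'hour' or 'month'."""
    parsed = (datetime.strptime(ts, fmt) for ts in timestamps)
    return Counter(getattr(dt, part) for dt in parsed)


by_hour = count_by(toy_timestamps, "hour")
by_month = count_by(toy_timestamps, "month")
print(by_hour.most_common(1))
print(by_month.most_common(1))
```

The same helper works for `"year"`, although with data from essentially a single year that breakdown carries little information.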
This map shows the locations of incidents in Dallas, Texas. Incidents happened in more or less all divisions of Dallas, Texas, but they are most densely concentrated in the Central division.
To summarise, the data analysis suggests that race plays a substantial role in policing in Dallas, Texas. The most challenging part of this analysis was the dataset itself. Many values were missing. There were also hardly any numerical variables, so the data needed to be pre-processed before every plot. If the dataset had not had missing values, the analysis would have been more accurate.